An Executable Sequential Specification for Spark Aggregation

Authors

  • Yu-Fang Chen
  • Chih-Duo Hong
  • Ondrej Lengál
  • Shin-Cheng Mu
  • Nishant Sinha
  • Bow-Yaw Wang

Abstract

Spark is a promising new platform for scalable data-parallel computation. It provides several high-level application programming interfaces (APIs) for parallel data aggregation. Because parallel aggregation in Spark executes non-deterministically, a natural requirement for Spark programs is that every execution on the same data set gives the same result. We present PURESPARK, an executable formal Haskell specification of Spark's aggregate combinators. The specification allows us to deduce the precise conditions under which Spark aggregation yields deterministic outcomes. We report case studies that analyze the determinism and correctness of Spark programs.
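To make the source of non-determinism concrete, here is a minimal Haskell sketch of a sequential model of a Spark-style aggregate, written in the spirit of an executable specification. It is not PURESPARK's actual definitions: the names `partitionings`, `aggregateWith`, `aggregateOutcomes`, and `isDeterministic` are hypothetical, and the model assumes that the data set is split into contiguous, non-empty partitions, that each partition is folded left to right with `seqOp` starting from the zero value, and that the partition results are then folded with `combOp`, again starting from the zero value, in an arbitrary order. The sketch enumerates every such execution on a small input and checks whether they all agree.

```haskell
import Data.List (nub, permutations)

-- All ways to split a list into contiguous, non-empty partitions.
-- Assumption of this sketch: real Spark partitions may also be empty.
partitionings :: [a] -> [[[a]]]
partitionings [] = [[]]
partitionings xs =
  [ p : rest
  | k <- [1 .. length xs]
  , let (p, ys) = splitAt k xs
  , rest <- partitionings ys
  ]

-- One execution: fold each partition with seqOp from the zero value z,
-- then fold the per-partition results with combOp from z, in the order
-- the partitions are listed (modelling results arriving in any order).
aggregateWith :: b -> (b -> a -> b) -> (b -> b -> b) -> [[a]] -> b
aggregateWith z seqOp combOp parts =
  foldl combOp z (map (foldl seqOp z) parts)

-- Every distinct result reachable over all partitionings of the input
-- and all orders in which the partition results can be combined.
aggregateOutcomes :: Eq b => b -> (b -> a -> b) -> (b -> b -> b) -> [a] -> [b]
aggregateOutcomes z seqOp combOp xs =
  nub
    [ aggregateWith z seqOp combOp order
    | parts <- partitionings xs
    , order <- permutations parts
    ]

-- The aggregate is deterministic on this input if all executions agree.
isDeterministic :: Eq b => b -> (b -> a -> b) -> (b -> b -> b) -> [a] -> Bool
isDeterministic z seqOp combOp xs =
  length (aggregateOutcomes z seqOp combOp xs) == 1
```

For example, `isDeterministic 0 (+) (+) [1 .. 4 :: Int]` evaluates to `True`, whereas collecting the elements into a list with `isDeterministic [] (\acc x -> acc ++ [x]) (++) [1, 2, 3 :: Int]` evaluates to `False`, because reordering the partition results reorders the output. Exhaustive enumeration only scales to tiny inputs; the sketch illustrates the question such a specification answers, not the precise condition derived in the paper.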

Similar resources

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

MapReduce is a popular programming paradigm for running large-scale data-intensive computation. Recently, many frameworks that implement that paradigm have been developed. To leverage such frameworks, however, developers need to familiarize themselves with each framework’s API and rewrite their code. We present Casper, a new tool that automatically translates sequential Java programs to the MapReduce parad...

Formal Foundations for the Generation of Heterogeneous Executable Specifications in SystemC from UML/MARTE Models

Embedded system heterogeneity leads to the need to understand the system as an aggregation of components in which different behavioural semantics should cohabit. Heterogeneity has two dimensions. On the one hand, during the design process, different execution semantics, specifically in terms of time (untimed, synchronous, timed) can be required in order to provide specific behaviour characteris...

Verifying Interlevel Relations within Multi-Agent Systems: formal theoretical basis

In the general case, at any aggregation level a behavioral specification for a multi-agent system component consists of dynamic properties expressed by complex temporal relations in TTL, which therefore does not allow direct application of automatic verification procedures, more specifically, model checking techniques, used in this paper. In order to apply model checking techniques it is needed...

Automatic Generation of CSP || B Skeletons from xUML Models

CSP ‖ B is a formal approach to specification that combines CSP and B. In this paper we present our tool that automatically translates a subset of executable UML (xUML) models into CSP ‖ B, for the purpose of verification and increased validation at the early stages of a software engineering development lifecycle. The tool is being developed for our industrial collaborators, AWE plc, in order t...

Executable UML and SPARK Ada: The Best of Both Worlds

Executable UML is a well defined UML subset supported by an Action Language that enables the construction of executable models from which reliable target code can be automatically generated. SPARK Ada is a safe Ada subset with formal annotations that renders programs amenable to static analysis and formal verification. This paper describes a hybrid approach where formally annotated Executable U...

Publication date: 2017